Abstract: Dual decomposition is a powerful technique for deriving
decomposition schemes for convex optimization problems with separable
structure. Although the Augmented Lagrangian is computationally more
stable than the ordinary Lagrangian, the prox-term destroys the
separability of the given problem. In this paper we use another approach
to obtain a smooth Lagrangian, based on a smoothing technique developed by
Nesterov, which preserves separability of the problem. With this approach
we derive a new decomposition method, called the proximal center
algorithm, which from the viewpoint of efficiency estimates improves the
bounds on the number of iterations of the classical dual gradient scheme
by an order of magnitude. This can be achieved with the new decomposition
algorithm since the resulting dual function has good smoothness properties
and since we make use of the particular structure of the given problem.

Index Terms: Smooth convex optimization, dual decomposition, proximal
center method, distributed control, distributed network optimization.

I. Introduction 

There has been considerable recent interest in parallel and distributed
computation methods for solving large-scale optimization problems (e.g.
[1]). For separable convex problems, i.e. problems with a separable
objective function but with coupling constraints (problems of this type
arise in many fields of engineering, e.g. networks [2], [3], distributed
model predictive control (MPC) [4], [5], stochastic programming [6],
etc.), many researchers have proposed dual decomposition algorithms such
as the dual subgradient method [1], [7], the alternating direction method
[1], [8]-[10], the proximal method of multipliers [11], the partial
inverse method [12], [13], etc. In general, these methods are based on
alternating minimization in a Gauss-Seidel fashion of an (Augmented)
Lagrangian followed by a steepest ascent update for the multipliers.
However, the step-size parameter, which has a very strong influence on the
convergence rate of these methods, is very difficult to tune, and the
methods do not provide any complexity estimates for the general case
(linear convergence is obtained e.g. under strong convexity assumptions).
Moreover, these methods use the steepest ascent update for the
multipliers, while we know from [14] that this update is inferior by one
order of magnitude to Nesterov's accelerated scheme.

In this paper we propose a new decomposition method for separable convex
optimization problems that overcomes the disadvantages mentioned above.
Based on a smoothing technique recently developed by Nesterov in [15], we
obtain a smooth Lagrangian that preserves separability of the problem.
Using this smooth Lagrangian, we derive a new dual decomposition method in
which the corresponding parameters are selected optimally and are thus
straightforward to tune. In contrast to the dual gradient update for the
multipliers used by most of the decomposition methods from the literature,
our method uses an optimal gradient-based scheme (see e.g. [14], [15]) for
updating the multipliers. Therefore, we derive for the new method an
efficiency estimate for the general case which improves by one order of
magnitude the complexity of the classical dual gradient method (i.e. the
steepest ascent update). To our knowledge these are the first efficiency
estimate results of a dual decomposition method for separable non-strongly
convex programs. The new algorithm is suitable for decomposition since it
is highly parallelizable and thus it can be effectively implemented on
parallel processors. This is a distinct feature of our method compared to
alternating direction methods based on Gauss-Seidel iterations, which do
not share this advantage.

This paper is organized as follows. Section II contains the problem
formulation, followed by a brief introduction of some of the existing dual
decomposition methods and the description of an accelerated scheme for
smooth maximization developed by Nesterov in [14], [15]. The main results
of the paper are presented in Section III, where we describe our new
decomposition method and its main properties, in particular global
convergence. We conclude with some applications and preliminary
computational results on some test problems in Section IV.

II. Preliminaries 

A. Decomposition methods for separable convex programs 

An important application of convex duality theory is in 
decomposition algorithms for solving large-scale problems but 
with special structure. One such example, that we also consider 
in this paper, is the following separable convex program:

$$f^* = \min_{x \in X,\, z \in Z} \{\phi_1(x) + \phi_2(z) : Ax + Bz = b\}, \qquad (1)$$

where $\phi_1 : \mathbb{R}^m \to \mathbb{R}$ and $\phi_2 : \mathbb{R}^p \to \mathbb{R}$
are continuous convex functions on $X$ and $Z$, respectively, $A$ is a
given $n \times m$ matrix, $B$ is a given $n \times p$ matrix, and $b$ is
a given vector in $\mathbb{R}^n$. In this paper we do not assume $\phi_1$
and/or $\phi_2$ to be strongly convex or smooth. Moreover, we assume that
$X \subseteq \mathbb{R}^m$ and $Z \subseteq \mathbb{R}^p$ are given compact
convex sets. We also use different norms on $\mathbb{R}^n$, $\mathbb{R}^m$
and $\mathbb{R}^p$, not necessarily the corresponding Euclidean norms.
However, for simplicity in notation we do not use indices to specify the
norms on $\mathbb{R}^n$, $\mathbb{R}^m$ and $\mathbb{R}^p$, since



IEEE TRANSACTIONS ON AUTOMATIC CONTROL 






from the context it will be clear in which Euclidean space we are working
and which norm we use.

Remark 2.1: Problems of type (1) arise e.g. in the context of large-scale
networks consisting of multiple agents with different objectives or in the
area of distributed model predictive control (see also Section IV).
We can also have any number $M$ of agents with different $\phi_i$'s, not
necessarily two agents. Moreover, the method developed in this paper can
handle both coupling equalities ($Ax + Bz = b$) and/or inequalities
($Ax + Bz \le b$). However, for simplicity of the exposition we restrict
ourselves to (1). □

Let $\langle \cdot, \cdot \rangle$ denote the scalar product on the
Euclidean space $\mathbb{R}^n$. By forming the Lagrangian corresponding to
the linear constraints (with the Lagrange multipliers
$\lambda \in \mathbb{R}^n$), i.e.

$$\mathcal{L}_0(x, z, \lambda) = \phi_1(x) + \phi_2(z) + \langle \lambda, Ax + Bz - b \rangle,$$

and using the dual decomposition method, one arrives at the following
decomposition algorithm:

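Algorithm 2.2: ([1], [7]) for $k \ge 0$ do

1. given $\lambda^k$, minimize the Lagrangian
$(x^{k+1}, z^{k+1}) = \arg\min_{x \in X,\, z \in Z} \mathcal{L}_0(x, z, \lambda^k)$,
or equivalently minimize in parallel over $x$ and $z$:

$$x^{k+1} = \arg\min_{x \in X} [\phi_1(x) + \langle \lambda^k, Ax \rangle],$$

$$z^{k+1} = \arg\min_{z \in Z} [\phi_2(z) + \langle \lambda^k, Bz \rangle]$$

2. update the multipliers:

$$\lambda^{k+1} = \lambda^k + c_k (A x^{k+1} + B z^{k+1} - b),$$

where $c_k$ is a positive step-size.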
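As an illustration only, the following Python sketch runs Algorithm 2.2 on a small strongly convex instance that is not taken from the paper: $\phi_1(x) = (x-1)^2$, $\phi_2(z) = (z+2)^2$, $X = Z = [-5, 5]$, and the scalar coupling constraint $x + z = 0$. The problem data, the constant step-size 0.5 and the function names `clip` and `dual_gradient` are assumptions made for this example. For this instance the optimal values are $x^* = 1.5$, $z^* = -1.5$ and $\lambda^* = -1$.

```python
def clip(value, low=-5.0, high=5.0):
    """Project a scalar onto the interval [low, high]."""
    return max(low, min(high, value))


def dual_gradient(step=0.5, iterations=50):
    """Algorithm 2.2 on: min (x-1)^2 + (z+2)^2  s.t.  x + z = 0 (toy data)."""
    lam = 0.0  # initial multiplier (arbitrary choice)
    x = z = 0.0
    for _ in range(iterations):
        # Step 1: the Lagrangian separates, so x and z are found independently.
        x = clip(1.0 - lam / 2.0)   # argmin over X of (x-1)^2 + lam * x
        z = clip(-2.0 - lam / 2.0)  # argmin over Z of (z+2)^2 + lam * z
        # Step 2: steepest ascent on the dual, using the residual Ax + Bz - b.
        lam += step * (x + z)
    return x, z, lam


x, z, lam = dual_gradient()
print(x, z, lam)
```

Because both objectives are strongly convex here, the minimizers in step 1 are unique and the multiplier error is halved at every iteration. The same loop applied to a merely convex (e.g. linear) objective would not have unique minimizers in step 1, which is the difficulty the rest of the paper addresses.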

The following assumption is valid throughout the paper:

Assumption 2.3: The set of optimal Lagrange multipliers $\Lambda^*$ is
nonempty for problem (1). □

It is known that Algorithm 2.2 is convergent under Assumption 2.3 and the
assumption that both $\phi_1$ and $\phi_2$ are strongly convex functions
(the latter guarantees that the minimizer $(x^{k+1}, z^{k+1})$ is unique).
In fact, under the assumption of strong convexity, the dual function

$$f_0(\lambda) = \min_{x \in X,\, z \in Z} \phi_1(x) + \phi_2(z) + \langle \lambda, Ax + Bz - b \rangle$$

is differentiable [1], [16], and thus Algorithm 2.2 can be seen as the
gradient method with step-size $c_k$ for maximizing the dual function.

However, for many interesting problems, especially those arising from
transformations that lead to decomposition (see Section IV), the functions
$\phi_1$ and $\phi_2$ are not strongly convex. There are some methods
(alternating direction method [8], [9], proximal point method [11],
partial inverse method [12]) that overcome this difficulty based on e.g.
alternating minimization in a Gauss-Seidel fashion of the Augmented
Lagrangian, followed by a steepest ascent update of the multipliers. A
computational drawback of these schemes is that the prox-term
$\frac{c}{2} \|Ax + Bz - b\|^2$, using the Euclidean norm framework,
present in the Augmented Lagrangian is not separable in $x$ and $z$.
Another disadvantage is that they cannot deal with coupling inequalities
in general. Moreover, these schemes were shown to be very sensitive to the
value of the parameter $c$, which makes it difficult in practice to obtain
the best convergence rate. Some heuristics for choosing $c$ can be found
in the literature [8], [9], [11]. However, these heuristics have not been
formally analyzed from the viewpoint of efficiency estimates for the
general non-smooth case (linear convergence results were obtained e.g. for
strongly convex functions). Note that alternating direction method
variants which allow for inexact minimization were proposed in [10], [11].
A closely related method is the partial inverse of a monotone operator
developed in [12], [13].

B. An accelerated scheme for smooth convex maximization 

In this section we briefly describe an accelerated scheme for smooth
convex functions, developed by Nesterov in [14], [15], that also uses only
first-order information. Let $f$ be a concave and differentiable function
on a closed convex set $Q \subseteq \mathbb{R}^n$. We further assume that
the gradient of this function is Lipschitz continuous:

$$\|\nabla f(x) - \nabla f(y)\|_* \le L \|x - y\| \quad \forall x, y \in Q,$$

where $\|s\|_* = \max_{\|x\| \le 1} \langle s, x \rangle$ is the
corresponding dual norm of the norm used on $\mathbb{R}^n$ [15], [17].

Definition 2.4: [15] We define a prox-function $d$ of the set $Q$ as a
function with the following properties:

(i) $d$ is continuous and strongly convex on $Q$ with convexity parameter
$\sigma$,

(ii) $u^0$ is the center of the set $Q$, i.e. $u^0 = \arg\min_{x \in Q} d(x)$,
such that $d(u^0) = 0$.

The goal is to find an approximate solution to the smooth convex problem
$x^* = \arg\max_{x \in Q} f(x)$. In Nesterov's scheme three sequences of
points from $Q$ are updated recursively: $\{u^k\}_{k \ge 0}$,
$\{x^k\}_{k \ge 0}$, and $\{v^k\}_{k \ge 0}$. The algorithm can be
described as follows:
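Algorithm 2.5: ([15]) for $k \ge 0$ do

1. compute $f(u^k)$ and $\nabla f(u^k)$

2. find $x^k = \arg\max_{x \in \{x^{k-1},\, \hat{x}^k,\, u^k\}} f(x)$, where

$$\hat{x}^k = \arg\max_{x \in Q} \Big\{ f(u^k) + \langle \nabla f(u^k), x - u^k \rangle - \frac{L}{2} \|x - u^k\|^2 \Big\}$$

3. find $v^k = \arg\max_{x \in Q} \Big\{ -\frac{L}{\sigma} d(x) + \sum_{l=0}^{k} \frac{l+1}{2} [f(u^l) + \langle \nabla f(u^l), x - u^l \rangle] \Big\}$

4. set $u^{k+1} = \frac{k+1}{k+3} x^k + \frac{2}{k+3} v^k$.

The derivation of Algorithm 2.5 is based on the notion of estimate
sequence. The main property of the estimate sequence corresponding to
Algorithm 2.5 is the following relation [15]:

$$\frac{(k+1)(k+2)}{4} f(x^k) \ge \max_{x \in Q} \Big\{ -\frac{L}{\sigma} d(x) + \sum_{l=0}^{k} \frac{l+1}{2} [f(u^l) + \langle \nabla f(u^l), x - u^l \rangle] \Big\}. \qquad (2)$$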



The convergence properties of Algorithm 2.5 are summarized in the
following theorem:

Theorem 2.6: [15] Let the sequence $\{x^k\}_{k \ge 0}$ be generated by
Algorithm 2.5. Then, $\{f(x^k)\}_{k \ge 0}$ is nondecreasing and we have
the following efficiency estimate:

$$f(x^*) - f(x^k) \le \frac{4 L\, d(x^*)}{\sigma (k+1)(k+2)}.$$






Theorem 2.6 tells us that from the viewpoint of efficiency estimates
Nesterov's method applied to maximization of a concave function with
Lipschitz continuous gradient has the order $O(\sqrt{L/\epsilon})$.
Therefore, the efficiency of the method is higher by an order of magnitude
than the corresponding pure gradient method (steepest ascent update with
complexity $O(L/\epsilon)$) for the same smooth problem (see [14]). Note
that in step 2 we can also directly take $x^k$ to be the maximizer of the
quadratic model built at $u^k$. The conclusions of Theorem 2.6 remain the
same except that the sequence $\{f(x^k)\}_{k \ge 0}$ is not necessarily
monotone.
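The accelerated scheme and its efficiency estimate can be checked numerically on a one-dimensional example. The sketch below is an illustration under stated assumptions, not part of the paper: it takes $Q = \mathbb{R}$, the concave test function $f(x) = -\ln\cosh(x - 3)$ (so $L = 1$, $x^* = 3$, $f(x^*) = 0$), the prox-function $d(x) = x^2/2$ (so $\sigma = 1$, $u^0 = 0$, $d(x^*) = 4.5$), and the simplified non-monotone choice of $x^k$ as the gradient-step point. All function names are hypothetical.

```python
import math


def f(x):
    """Concave test function with maximizer x* = 3 and f(x*) = 0."""
    return -math.log(math.cosh(x - 3.0))


def grad_f(x):
    """Derivative of f; it is Lipschitz continuous with constant L = 1."""
    return -math.tanh(x - 3.0)


def accelerated_ascent(iterations, L=1.0):
    """Accelerated scheme with Q = R and d(x) = x^2 / 2 (sigma = 1, u^0 = 0)."""
    u = 0.0
    weighted_grad_sum = 0.0
    gaps = []
    for k in range(iterations):
        g = grad_f(u)
        x = u + g / L                           # maximizer of the quadratic model at u
        weighted_grad_sum += 0.5 * (k + 1) * g
        v = weighted_grad_sum / L               # closed form of the v-step for d(x) = x^2 / 2
        u = (k + 1) / (k + 3) * x + 2.0 / (k + 3) * v
        gaps.append(-f(x))                      # f(x*) - f(x^k), since f(x*) = 0
    return gaps


gaps = accelerated_ascent(50)
# Right-hand side of the efficiency estimate: 4 L d(x*) / (sigma (k+1)(k+2)).
bounds = [4.0 * 1.0 * 4.5 / ((k + 1) * (k + 2)) for k in range(50)]
print(gaps[-1], bounds[-1])
```

Every entry of `gaps` should stay below the matching entry of `bounds`, which decays like $1/k^2$.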

III. A NEW DECOMPOSITION METHOD BASED ON 
SMOOTHING THE LAGRANGIAN 

In this section we propose a new method to smooth the Lagrangian of (1),
inspired by [15]. This smoothing technique preserves the separability of
the problem and moreover the corresponding parameters are easy to tune.
Since separability is preserved under this smoothing technique, we derive
a new dual decomposition method in which the multipliers are updated
according to Algorithm 2.5. Moreover, we obtain efficiency estimates for
the new method for the general case and also global convergence. Note that
with our method we can treat both coupling equalities $Ax + Bz = b$ and/or
inequalities $Ax + Bz \le b$ (see also Remark 3.8).



A. Smoothing the Lagrangian 

Let $d_X$ and $d_Z$ be two prox-functions for the compact convex sets $X$
and $Z$, with convexity parameters $\sigma_X$ and $\sigma_Z$,
respectively. Denote $x^0 = \arg\min_{x \in X} d_X(x)$ and
$z^0 = \arg\min_{z \in Z} d_Z(z)$. Since $X$ and $Z$ are compact and $d_X$
and $d_Z$ are continuous, we can choose finite and positive constants

$$D_X \ge \max_{x \in X} d_X(x), \quad D_Z \ge \max_{z \in Z} d_Z(z).$$

We also introduce the following notation:
$\|A\| = \max_{\|\lambda\| = 1,\, \|x\| = 1} \langle \lambda, Ax \rangle$.
Since the linear operator $A$ is defined as
$A : \mathbb{R}^m \to \mathbb{R}^n_*$, where $\mathbb{R}^n_*$ is the dual
of $\mathbb{R}^n$ (in fact $\mathbb{R}^n_* = \mathbb{R}^n$), we have

$$\|A\| = \max_{\|x\| = 1} \|Ax\|_* \quad \text{and} \quad \|Ax\|_* \le \|A\| \|x\| \quad \forall x.$$

Similarly for $B$. Let us introduce the following family of functions:

$$f_c(\lambda) = \min_{x \in X,\, z \in Z} [\phi_1(x) + \phi_2(z) + \langle \lambda, Ax + Bz - b \rangle + c (d_X(x) + d_Z(z))], \qquad (3)$$

where $c$ is a positive smoothness parameter that will be defined later in
this section. Note that we could also choose different parameters $c_1$
and $c_2$ for each prox-term. The generalization is straightforward. It is
clear that the objective function in (3) is separable in $x$ and $z$, i.e.

$$f_c(\lambda) = -\langle \lambda, b \rangle + \min_{x \in X} [\phi_1(x) + \langle \lambda, Ax \rangle + c\, d_X(x)] + \min_{z \in Z} [\phi_2(z) + \langle \lambda, Bz \rangle + c\, d_Z(z)]. \qquad (4)$$

Denote by $x(\lambda)$ and $z(\lambda)$ the optimal solutions of the
minimization problems in $x$ and $z$, respectively. The function $f_c$ has
the following smoothness properties:

Theorem 3.1: The function $f_c$ is concave and continuously differentiable
at any $\lambda \in \mathbb{R}^n$. Moreover, its gradient
$\nabla f_c(\lambda) = Ax(\lambda) + Bz(\lambda) - b$ is Lipschitz
continuous with Lipschitz constant
$L_c = \frac{\|A\|^2}{c\,\sigma_X} + \frac{\|B\|^2}{c\,\sigma_Z}$. The
following inequalities hold:

$$f_c(\lambda) \ge f_0(\lambda) \ge f_c(\lambda) - c (D_X + D_Z) \quad \forall \lambda \in \mathbb{R}^n. \qquad (5)$$

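Proof: Since the functions $d_X$ and $d_Z$ are strongly convex, it follows
that the optimal solution $(x(\lambda), z(\lambda))$ of (3) or (4) is
unique for any $\lambda$ and thus the function $f_c$ is well defined at
any $\lambda$. Concavity and continuous differentiability of $f_c$ follow
from standard duality theory [1], [16]. It remains to show that its
gradient $\nabla f_c(\lambda) = Ax(\lambda) + Bz(\lambda) - b$ is
Lipschitz continuous. For simplicity of notation we assume that all the
functions involved in the minimization problem (3) are differentiable. Let
$\lambda$ and $\eta$ be two Lagrange multipliers. Using first-order
optimality conditions for the minimization problem in $x$ we obtain:

$$\langle \nabla \phi_1(x(\lambda)) + A^T \lambda + c \nabla d_X(x(\lambda)),\, x(\eta) - x(\lambda) \rangle \ge 0,$$
$$\langle \nabla \phi_1(x(\eta)) + A^T \eta + c \nabla d_X(x(\eta)),\, x(\lambda) - x(\eta) \rangle \ge 0.$$

Adding these two inequalities and since $\phi_1$ is convex and $d_X$ is
strongly convex, we obtain

$$\langle A^T (\eta - \lambda),\, x(\lambda) - x(\eta) \rangle \ge \langle \nabla \phi_1(x(\lambda)) - \nabla \phi_1(x(\eta)),\, x(\lambda) - x(\eta) \rangle + c \langle \nabla d_X(x(\lambda)) - \nabla d_X(x(\eta)),\, x(\lambda) - x(\eta) \rangle \ge c\,\sigma_X \|x(\lambda) - x(\eta)\|^2.$$

From the last relation and the Cauchy-Schwarz inequality we have:

$$\|Ax(\lambda) - Ax(\eta)\|_*^2 \le \|A\|^2 \|x(\lambda) - x(\eta)\|^2 \le \frac{\|A\|^2}{c\,\sigma_X} \langle A^T (\eta - \lambda),\, x(\lambda) - x(\eta) \rangle \le \frac{\|A\|^2}{c\,\sigma_X} \|\lambda - \eta\| \, \|Ax(\lambda) - Ax(\eta)\|_*,$$

and thus
$\|Ax(\lambda) - Ax(\eta)\|_* \le \frac{\|A\|^2}{c\,\sigma_X} \|\lambda - \eta\|$.
Similarly, for the minimization problem in $z$ we obtain
$\|Bz(\lambda) - Bz(\eta)\|_* \le \frac{\|B\|^2}{c\,\sigma_Z} \|\lambda - \eta\|$.
In conclusion, the gradient of $f_c$ satisfies

$$\|\nabla f_c(\lambda) - \nabla f_c(\eta)\|_* \le \Big( \frac{\|A\|^2}{c\,\sigma_X} + \frac{\|B\|^2}{c\,\sigma_Z} \Big) \|\lambda - \eta\|.$$

Furthermore, the first inequality in (5) is a consequence of the fact that
$d_X(x) \ge 0$ for all $x$, and $d_Z(z) \ge 0$ for all $z$. The second
inequality in (5) follows from:

$$f_c(\lambda) \le \min_{x \in X,\, z \in Z} [\phi_1(x) + \phi_2(z) + \langle \lambda, Ax + Bz - b \rangle] + c \max_{x \in X,\, z \in Z} [d_X(x) + d_Z(z)]. \quad □$$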
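The two claims of Theorem 3.1 can be verified numerically on a toy instance. The following sketch is an illustration only and its data are assumptions, not taken from the paper: $\phi_1(x) = x$, $\phi_2(z) = 2z$ (linear, hence not strongly convex), $X = Z = [-1, 1]$, constraint $x + z = 0$, and prox-functions $d_X(t) = d_Z(t) = t^2/2$, so that $\sigma_X = \sigma_Z = 1$, $D_X = D_Z = 1/2$ and $L_c = 2/c$.

```python
def clip(value, low=-1.0, high=1.0):
    """Project a scalar onto the interval [low, high]."""
    return max(low, min(high, value))


def f0(lam):
    """Ordinary dual function of: min x + 2z  s.t.  x + z = 0, x, z in [-1, 1]."""
    return -abs(1.0 + lam) - abs(2.0 + lam)


def minimizers(lam, c):
    """Unique minimizers x(lam), z(lam) of the smoothed Lagrangian."""
    return clip(-(1.0 + lam) / c), clip(-(2.0 + lam) / c)


def fc(lam, c):
    """Smoothed dual function f_c for the toy instance."""
    x, z = minimizers(lam, c)
    return (1.0 + lam) * x + (2.0 + lam) * z + 0.5 * c * (x * x + z * z)


def grad_fc(lam, c):
    """Gradient Ax(lam) + Bz(lam) - b with A = B = 1 and b = 0."""
    x, z = minimizers(lam, c)
    return x + z


c = 0.1
D_total = 0.5 + 0.5          # D_X + D_Z
L_c = 1.0 / c + 1.0 / c      # ||A||^2 / (c sigma_X) + ||B||^2 / (c sigma_Z)
grid = [-4.0 + 0.01 * i for i in range(601)]

# Inequalities (5): f_c >= f_0 >= f_c - c (D_X + D_Z) on the whole grid.
sandwich_ok = all(
    fc(l, c) >= f0(l) - 1e-12 and f0(l) >= fc(l, c) - c * D_total - 1e-12
    for l in grid
)
# Lipschitz continuity of the gradient with constant L_c.
lipschitz_ok = all(
    abs(grad_fc(a, c) - grad_fc(b, c)) <= L_c * abs(a - b) + 1e-12
    for a, b in zip(grid, grid[1:])
)
print(sandwich_ok, lipschitz_ok)
```

Note that `f0` has kinks at $\lambda = -1$ and $\lambda = -2$, while `fc` replaces each kink by a quadratic piece of width proportional to $c$.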



B. A proximal center-based decomposition method 

In this section we derive a new dual decomposition method based on the
smoothing technique described in Section III-A. The new algorithm, called
here the proximal center algorithm, has the nice feature that the
coordination between the agents involves the maximization of a smooth
concave objective function (i.e. with Lipschitz continuous gradient).
Moreover, the resource allocation stage consists in solving in parallel
two independent minimization problems with strongly convex objectives. The
new method belongs to the class of two-level algorithms [18] and is
particularly suitable for separable convex problems where the
minimizations over $x$ and $z$ in (4) are easily carried out.

We apply the accelerated method described in Algorithm 2.5 to the concave
function $f_c$ with Lipschitz continuous gradient:

$$\max_{\lambda \in Q} f_c(\lambda), \qquad (6)$$

where $Q$ is a given closed convex set in $\mathbb{R}^n$ that contains at
least one optimal multiplier $\lambda^* \in \Lambda^*$. Notice that
$Q \subseteq \mathbb{R}^n$ for linear equalities (i.e. $Ax + Bz - b = 0$),
$Q \subseteq \mathbb{R}^n_+$, where $\mathbb{R}_+$ denotes the set of
nonnegative real numbers, for linear inequalities (i.e.
$Ax + Bz - b \le 0$), or
$Q \subseteq \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}_+$, where
$n_1 + n_2 = n$, when both equalities and inequalities are present. Note
that according to Algorithm 2.5 we also need to choose a prox-function
$d_Q$ for the set $Q$ with the convexity parameter $\sigma_Q$ and center
$u^0$. The proximal center algorithm can be described as follows:

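Algorithm 3.2: for $k \ge 0$ do

1. given $u^k$ compute in parallel

$$x^{k+1} = \arg\min_{x \in X} [\phi_1(x) + \langle u^k, Ax \rangle + c\, d_X(x)],$$

$$z^{k+1} = \arg\min_{z \in Z} [\phi_2(z) + \langle u^k, Bz \rangle + c\, d_Z(z)].$$

2. compute

$$f_c(u^k) = \mathcal{L}_0(x^{k+1}, z^{k+1}, u^k) + c (d_X(x^{k+1}) + d_Z(z^{k+1})),$$
$$\nabla f_c(u^k) = A x^{k+1} + B z^{k+1} - b$$

3. find $\lambda^k = \arg\max_{\lambda \in \{\lambda^{k-1},\, \hat{\lambda}^k,\, u^k\}} f_c(\lambda)$, where

$$\hat{\lambda}^k = \arg\max_{\lambda \in Q} \Big\{ f_c(u^k) + \langle \nabla f_c(u^k), \lambda - u^k \rangle - \frac{L_c}{2} \|\lambda - u^k\|^2 \Big\}$$

4. find

$$v^k = \arg\max_{\lambda \in Q} \Big\{ -\frac{L_c}{\sigma_Q} d_Q(\lambda) + \sum_{l=0}^{k} \frac{l+1}{2} [f_c(u^l) + \langle \nabla f_c(u^l), \lambda - u^l \rangle] \Big\}$$

5. set $u^{k+1} = \frac{k+1}{k+3} \lambda^k + \frac{2}{k+3} v^k$.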



The proximal center algorithm is suitable for decomposition 
since it is highly parallelizable: the agents can solve in parallel 
their corresponding minimization problems. We now derive a 
lower bound for the value of the objective function which will 
be used frequently in the sequel: 

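Lemma 3.3: For any $\lambda^* \in \Lambda^*$ and $\hat{x} \in X$,
$\hat{z} \in Z$, the following lower bound on the primal gap holds:

$$[\phi_1(\hat{x}) + \phi_2(\hat{z})] - f^* \ge -\|\lambda^*\| \, \|A\hat{x} + B\hat{z} - b\|_*.$$

Proof: From our assumptions we have that

$$f^* = \min_{x \in X,\, z \in Z} [\phi_1(x) + \phi_2(z) + \langle Ax + Bz - b, \lambda^* \rangle] \le \phi_1(\hat{x}) + \phi_2(\hat{z}) + \langle A\hat{x} + B\hat{z} - b, \lambda^* \rangle,$$

and then using the Cauchy-Schwarz inequality, the result follows. □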



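The previous lemma shows that if
$\|A\hat{x} + B\hat{z} - b\|_* \le \epsilon_c$, then the primal gap is
bounded: for all $\hat{\lambda} \in Q$

$$-\epsilon_c \|\lambda^*\| \le [\phi_1(\hat{x}) + \phi_2(\hat{z})] - f^* \le \phi_1(\hat{x}) + \phi_2(\hat{z}) - f_0(\hat{\lambda}). \qquad (7)$$

Therefore, if we are able to derive an upper bound $\epsilon$ for the
duality gap and $\epsilon_c$ for the coupling constraints for some given
$\hat{\lambda}$ and $\hat{x} \in X$, $\hat{z} \in Z$, then we conclude
that $(\hat{x}, \hat{z})$ is an $(\epsilon, \epsilon_c)$-solution for
problem (1) (since in this case
$-\epsilon_c \|\lambda^*\| \le \phi_1(\hat{x}) + \phi_2(\hat{z}) - f^* \le \epsilon$
for all $\lambda^* \in \Lambda^*$). The next theorem derives an upper
bound on the duality gap for our method.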

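Theorem 3.4: Assume that there exists a closed convex set $Q$ that
contains a $\lambda^* \in \Lambda^*$. Then, after $k$ iterations we obtain
an approximate solution to the problem (1)

$$(\hat{x}, \hat{z}) = \sum_{l=0}^{k} \frac{2(l+1)}{(k+1)(k+2)} (x^{l+1}, z^{l+1})$$

and $\hat{\lambda} = \lambda^k$ which satisfy the following bound on the
duality gap:

$$[\phi_1(\hat{x}) + \phi_2(\hat{z})] - f_0(\hat{\lambda}) \le c (D_X + D_Z) - \max_{\lambda \in Q} \Big[ -\frac{4 L_c}{\sigma_Q (k+1)^2} d_Q(\lambda) + \langle A\hat{x} + B\hat{z} - b, \lambda \rangle \Big]. \qquad (8)$$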

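Proof: For an arbitrary $c$, we have from the inequality (2) that after
$k$ iterations the Lagrange multiplier $\hat{\lambda}$ satisfies the
following relation:

$$\frac{(k+1)(k+2)}{4} f_c(\hat{\lambda}) \ge \max_{\lambda \in Q} \Big\{ -\frac{L_c}{\sigma_Q} d_Q(\lambda) + \sum_{l=0}^{k} \frac{l+1}{2} [f_c(u^l) + \langle \nabla f_c(u^l), \lambda - u^l \rangle] \Big\}.$$

In view of the previous inequality, and since $(k+1)(k+2) \ge (k+1)^2$, we
have:

$$f_c(\hat{\lambda}) \ge \max_{\lambda \in Q} \Big\{ -\frac{4 L_c}{\sigma_Q (k+1)^2} d_Q(\lambda) + \sum_{l=0}^{k} \frac{2(l+1)}{(k+1)(k+2)} [f_c(u^l) + \langle \nabla f_c(u^l), \lambda - u^l \rangle] \Big\}.$$

Now, replacing $f_c(u^l)$ and $\nabla f_c(u^l)$ with the expressions given
in step 2 of Algorithm 3.2, we obtain for all $\lambda \in Q$:

$$\sum_{l=0}^{k} \frac{2(l+1)}{(k+1)(k+2)} [f_c(u^l) + \langle \nabla f_c(u^l), \lambda - u^l \rangle] \ge \sum_{l=0}^{k} \frac{2(l+1)}{(k+1)(k+2)} [\langle A x^{l+1} + B z^{l+1} - b, \lambda \rangle + \phi_1(x^{l+1}) + \phi_2(z^{l+1})] \ge \langle A\hat{x} + B\hat{z} - b, \lambda \rangle + \phi_1(\hat{x}) + \phi_2(\hat{z}).$$

The first inequality follows from the fact that the prox-functions satisfy
$d_X, d_Z \ge 0$ and the last inequality follows from convexity of the
functions $\phi_1$ and $\phi_2$. Using the last relation and (5) we derive
the bound (8) on the duality gap. □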

We now show how to construct the set Q and how to choose 
optimally the smoothness parameter c. In the next two sections 
we discuss two cases depending on the choices for $Q$ and $d_Q$.

C. Efficiency estimates for compact Q

Let $D_Q$ be a positive constant satisfying

$$\max_{\lambda \in Q} d_Q(\lambda) \le D_Q. \qquad (9)$$

Let us note that we can choose $D_Q$ finite whenever $Q$ is compact. In
this section we specialize the result of Theorem 3.4 for the case when $Q$
has the following form:

$$Q = \{\lambda \in \mathbb{R}^n : \|\lambda\| \le R\}.$$

Theorem 3.5: Assume that $\Lambda^*$ is bounded. Then, the sequence
$\{\lambda^k\}_{k \ge 0}$ generated by Algorithm 3.2 is also bounded.

Proof: Note that $\Lambda^* = \{\lambda : f_0(\lambda) \ge f^*\}$. Let us
introduce the sets
$\Lambda^0 = \{\lambda : f_0(\lambda) \ge f^* - c (D_X + D_Z)\}$ and
$\Lambda^c = \{\lambda : f_c(\lambda) \ge f^*\}$. From the inequalities in
(5) it follows immediately that $\Lambda^* \subseteq \Lambda^0$ and also
$\Lambda^* \subseteq \Lambda^c$. Therefore, the sets $\Lambda^0$ and
$\Lambda^c$ are nonempty. Since $\Lambda^*$ is bounded, from Corollary
8.7.1 in [16] it follows that the set $\Lambda^0$ is also bounded. We can
also show that $\Lambda^c$ is a bounded set. Indeed, let
$\lambda \in \Lambda^c$; then using once more the second inequality in (5)
we obtain $f_0(\lambda) + c (D_X + D_Z) \ge f_c(\lambda) \ge f^*$, i.e.
$\lambda \in \Lambda^0$. In conclusion, $\Lambda^c \subseteq \Lambda^0$
and thus $\Lambda^c$ is also bounded.

Let us now show that the sequence $\{\lambda^k\}_{k \ge 0}$ is bounded.
From Theorem 2.6 it follows that the sequence
$\{f_c(\lambda^k)\}_{k \ge 0}$ is nondecreasing and thus
$\{\lambda^k : k \ge 0\} \subseteq \{\lambda : f_c(\lambda) \ge f_c(\lambda^0)\}$.
But since $\Lambda^c$ is bounded, using once again Corollary 8.7.1 in [16]
it follows that the set $\{\lambda : f_c(\lambda) \ge f_c(\lambda^0)\}$ is
bounded. In conclusion, the sequence $\{\lambda^k\}_{k \ge 0}$ is bounded. □

Since Assumption 2.3 holds, $\Lambda^*$ is nonempty. Conditions under
which $\Lambda^*$ is bounded can be found in [16] (e.g. when the matrix
$[A \; B]$ has full rank). Under the assumptions of Theorem 3.5 it follows
that there exists $R > 0$ sufficiently large such that the set
$Q = \{\lambda \in \mathbb{R}^n : \|\lambda\| \le R\}$ contains¹ a
$\lambda^* \in \Lambda^*$, and thus we can assume $D_Q$ to be finite.
Notice that similar arguments were used in order to prove convergence of
two-level algorithms for convex problems in [18].

The next theorem shows how to choose optimally the smoothness parameter
$c$ and provides the complexity estimates of our method for the case when
$Q$ is a ball.

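Theorem 3.6: Assume that there exists $R > 0$ such that the set
$Q = \{\lambda \in \mathbb{R}^n : \|\lambda\| \le R\}$ contains a
$\lambda^* \in \Lambda^*$. Taking

$$c = \frac{2}{k+1} \sqrt{\frac{D_Q}{D_X + D_Z} \Big( \frac{\|A\|^2}{\sigma_Q \sigma_X} + \frac{\|B\|^2}{\sigma_Q \sigma_Z} \Big)},$$

then after $k$ iterations we obtain an approximate solution to the problem
(1),
$(\hat{x}, \hat{z}) = \sum_{l=0}^{k} \frac{2(l+1)}{(k+1)(k+2)} (x^{l+1}, z^{l+1})$
and $\hat{\lambda} = \lambda^k$, which satisfy the following bounds on the
duality gap and constraints:

$$[\phi_1(\hat{x}) + \phi_2(\hat{z})] - f_0(\hat{\lambda}) \le \frac{4}{k+1} \sqrt{D_Q (D_X + D_Z) \Big( \frac{\|A\|^2}{\sigma_Q \sigma_X} + \frac{\|B\|^2}{\sigma_Q \sigma_Z} \Big)},$$

$$\|A\hat{x} + B\hat{z} - b\|_* \le \frac{4}{(R - \|\lambda^*\|)(k+1)} \sqrt{D_Q (D_X + D_Z) \Big( \frac{\|A\|^2}{\sigma_Q \sigma_X} + \frac{\|B\|^2}{\sigma_Q \sigma_Z} \Big)}.$$

¹ In a practical algorithm $R$ is increased adaptively: if $\lambda^k$
approaches the boundary of $Q$ we take $R_{k+1} = \alpha R_k$ for some
$\alpha > 1$. An upper bound on $\|\lambda^*\|$ can also be estimated
using $R$.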

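Proof: Using (9) and the form of $Q$ we obtain that

$$\max_{\lambda \in Q} \Big\{ -\frac{4 L_c}{\sigma_Q (k+1)^2} d_Q(\lambda) + \langle A\hat{x} + B\hat{z} - b, \lambda \rangle \Big\} \ge -\frac{4 L_c D_Q}{\sigma_Q (k+1)^2} + \max_{\|\lambda\| \le R} \langle A\hat{x} + B\hat{z} - b, \lambda \rangle = -\frac{4 L_c D_Q}{\sigma_Q (k+1)^2} + R \|A\hat{x} + B\hat{z} - b\|_*.$$

In view of the previous relation and Theorem 3.4 we obtain the following
bound on the duality gap:

$$[\phi_1(\hat{x}) + \phi_2(\hat{z})] - f_0(\hat{\lambda}) \le c (D_X + D_Z) + \frac{4 L_c D_Q}{\sigma_Q (k+1)^2} - R \|A\hat{x} + B\hat{z} - b\|_* \le c (D_X + D_Z) + \frac{4 L_c D_Q}{\sigma_Q (k+1)^2}.$$

Minimizing the right-hand side of this inequality over $c$ we get the
above expressions for $c$ and for the upper bound on the duality gap.
Moreover, for the constraints, using Lemma 3.3 and inequality (7) we have
that

$$(R - \|\lambda^*\|) \|A\hat{x} + B\hat{z} - b\|_* \le c (D_X + D_Z) + \frac{4 L_c D_Q}{\sigma_Q (k+1)^2},$$

and replacing $c$ derived above we also get the upper bound on the
constraints violation. □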

From Theorem 3.6 and inequality (7) we obtain that the complexity for
finding an $(\epsilon, \epsilon_c)$-approximation of the optimal value
function $f^*$, when the set $Q$ is a ball, is

$$k + 1 = 4 \sqrt{D_Q (D_X + D_Z) \Big( \frac{\|A\|^2}{\sigma_Q \sigma_X} + \frac{\|B\|^2}{\sigma_Q \sigma_Z} \Big)} \, \frac{1}{\epsilon},$$

i.e. the efficiency estimate of our scheme is of the order
$O(\frac{1}{\epsilon})$, better than most non-smooth optimization schemes
such as the subgradient method, which have an efficiency estimate of the
order $O(\frac{1}{\epsilon^2})$ (see e.g. [14]). Moreover, the dependence
of the parameters $c$ and $L_c$ on $\epsilon$ is as follows:
$c = \frac{\epsilon}{2 (D_X + D_Z)}$ and
$L_c = \Big( \frac{\|A\|^2}{\sigma_X} + \frac{\|B\|^2}{\sigma_Z} \Big) \frac{2 (D_X + D_Z)}{\epsilon}$.
Another advantage of the proximal center method is that we are free in the
choice of the norms in the spaces $\mathbb{R}^n$, $\mathbb{R}^m$ and
$\mathbb{R}^p$, while most of the decomposition schemes are based on the
Euclidean norm. Thus, we can choose



the norms which make the ratio 



as small as possible. 
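To make the parameter choice concrete, the following sketch evaluates the formulas of Theorem 3.6 for a target accuracy $\epsilon$. It is only a convenience calculation under stated assumptions: the function name `ball_parameters` and the numeric inputs in the usage lines are hypothetical, and all constants ($D_Q$, $D_X$, $D_Z$, the operator norms and the convexity parameters) are assumed to be known.

```python
import math


def ball_parameters(eps, D_Q, D_X, D_Z, norm_A, norm_B, sigma_Q, sigma_X, sigma_Z):
    """Iteration count, smoothing parameter and Lipschitz constant when Q is a ball."""
    spread = norm_A ** 2 / (sigma_Q * sigma_X) + norm_B ** 2 / (sigma_Q * sigma_Z)
    # k + 1 = 4 * sqrt(D_Q (D_X + D_Z) spread) / eps, rounded up to an integer.
    iterations = math.ceil(4.0 * math.sqrt(D_Q * (D_X + D_Z) * spread) / eps)
    # c = 2 / (k + 1) * sqrt(D_Q / (D_X + D_Z) * spread)
    c = 2.0 / iterations * math.sqrt(D_Q / (D_X + D_Z) * spread)
    # L_c = (||A||^2 / sigma_X + ||B||^2 / sigma_Z) / c
    L_c = (norm_A ** 2 / sigma_X + norm_B ** 2 / sigma_Z) / c
    # Resulting upper bound on the duality gap after these iterations.
    gap_bound = 4.0 / iterations * math.sqrt(D_Q * (D_X + D_Z) * spread)
    return iterations, c, L_c, gap_bound


# Hypothetical constants, chosen only to exercise the formulas.
iterations, c, L_c, gap_bound = ball_parameters(
    eps=0.01, D_Q=2.0, D_X=0.5, D_Z=0.5,
    norm_A=1.0, norm_B=1.0, sigma_Q=1.0, sigma_X=1.0, sigma_Z=1.0,
)
print(iterations, c, L_c, gap_bound)
```

Halving $\epsilon$ doubles the iteration count and halves $c$, which is the $O(1/\epsilon)$ behaviour stated above.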



D. Efficiency estimates for the Euclidian norm 

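In this section we assume that $\mathbb{R}^n$ is endowed with the
Euclidean norm.

Theorem 3.7: Assume that $Q = \mathbb{R}^n$ and
$d_Q(\lambda) = \frac{1}{2} \|\lambda\|^2$ with the Euclidean norm on
$\mathbb{R}^n$. Taking $c = \frac{\epsilon}{D_X + D_Z}$ and

$$k + 1 = 2 \sqrt{\Big( \frac{\|A\|^2}{\sigma_X} + \frac{\|B\|^2}{\sigma_Z} \Big) (D_X + D_Z)} \, \frac{1}{\epsilon},$$

then after $k$ iterations the duality gap is less than $\epsilon$ and the
constraints satisfy
$\|A\hat{x} + B\hat{z} - b\| \le \epsilon \big( \|\lambda^*\| + \sqrt{\|\lambda^*\|^2 + 2} \big)$.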

Proof: Let us note that for these choices of $Q$ and $d_Q$ we have
$\sigma_Q = 1$ and thus

$$\max_{\lambda \in \mathbb{R}^n} \Big\{ -\frac{4 L_c}{(k+1)^2} d_Q(\lambda) + \langle A\hat{x} + B\hat{z} - b, \lambda \rangle \Big\} = \frac{(k+1)^2}{8 L_c} \|A\hat{x} + B\hat{z} - b\|^2.$$

In this case we obtain the following bound on the duality gap (see Theorem
3.4):

$$[\phi_1(\hat{x}) + \phi_2(\hat{z})] - f_0(\hat{\lambda}) \le c (D_X + D_Z) - \frac{(k+1)^2}{8 L_c} \|A\hat{x} + B\hat{z} - b\|^2 \le c (D_X + D_Z).$$

It follows that taking $c = \frac{\epsilon}{D_X + D_Z}$, the duality gap
is less than $\epsilon$. For the constraints, using Lemma 3.3 and
inequality (7) we get that $y = \|A\hat{x} + B\hat{z} - b\|$ satisfies the
second-order inequality

$$\frac{(k+1)^2}{8 L_c} y^2 - \|\lambda^*\| y - \epsilon \le 0.$$

Therefore, $\|A\hat{x} + B\hat{z} - b\|$ must be less than the largest
root of the corresponding second-order equation, i.e.

$$\|A\hat{x} + B\hat{z} - b\| \le \frac{4 L_c}{(k+1)^2} \Big( \|\lambda^*\| + \sqrt{\|\lambda^*\|^2 + \frac{\epsilon (k+1)^2}{2 L_c}} \Big).$$

With some straightforward computations we get that after $k$ iterations,
where $k$ is defined as in the theorem, we also have

$$\|A\hat{x} + B\hat{z} - b\| \le \epsilon \big( \|\lambda^*\| + \sqrt{\|\lambda^*\|^2 + 2} \big). \quad □$$
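The pieces above can be assembled into a small end-to-end run. The sketch below is an illustration under assumptions that are not from the paper: the toy problem $\min\, x + 2z$ subject to $x + z = 0$, $x, z \in [-1, 1]$ (so $f^* = -1$ and $\Lambda^* = [-2, -1]$), prox-functions $d_X(t) = d_Z(t) = t^2/2$, $Q = \mathbb{R}$ with $d_Q(\lambda) = \lambda^2/2$, and the simplified non-monotone choice $\lambda^k = \hat{\lambda}^k$ in step 3 of Algorithm 3.2. The parameters $c$ and $k$ follow Theorem 3.7; all function names are hypothetical.

```python
import math


def clip(value, low=-1.0, high=1.0):
    """Project a scalar onto the interval [low, high]."""
    return max(low, min(high, value))


def proximal_center(eps):
    """Proximal center iteration on: min x + 2z  s.t.  x + z = 0, x, z in [-1, 1]."""
    D_total = 0.5 + 0.5                  # D_X + D_Z for d(t) = t^2 / 2 on [-1, 1]
    spread = 1.0 + 1.0                   # ||A||^2 / sigma_X + ||B||^2 / sigma_Z
    c = eps / D_total                    # smoothing parameter from Theorem 3.7
    L_c = spread / c                     # Lipschitz constant of the gradient of f_c
    steps = math.ceil(2.0 * math.sqrt(spread * D_total) / eps)   # k + 1
    u = 0.0                              # center of d_Q(lam) = lam^2 / 2
    weighted_grad_sum = 0.0
    lam = 0.0
    x_hat = z_hat = 0.0
    for l in range(steps):
        # Step 1: two independent strongly convex problems (closed form here).
        x = clip(-(1.0 + u) / c)
        z = clip(-(2.0 + u) / c)
        # Step 2: gradient of the smoothed dual function at u.
        grad = x + z
        # Step 3 (simplified): maximizer of the quadratic model at u.
        lam = u + grad / L_c
        # Step 4: closed form of the v-step for d_Q(lam) = lam^2 / 2.
        weighted_grad_sum += 0.5 * (l + 1) * grad
        v = weighted_grad_sum / L_c
        # Step 5: combine the two sequences.
        u = (l + 1) / (l + 3) * lam + 2.0 / (l + 3) * v
        # Weighted average of the primal iterates, as in Theorem 3.4.
        weight = 2.0 * (l + 1) / (steps * (steps + 1))
        x_hat += weight * x
        z_hat += weight * z
    return x_hat, z_hat, lam


def dual_value(lam):
    """Ordinary dual function f_0 of the toy problem."""
    return -abs(1.0 + lam) - abs(2.0 + lam)


eps = 0.01
x_hat, z_hat, lam_hat = proximal_center(eps)
duality_gap = (x_hat + 2.0 * z_hat) - dual_value(lam_hat)
residual = abs(x_hat + z_hat)
print(duality_gap, residual)
```

For this instance the theory predicts a duality gap of at most $\epsilon$ and, taking $\lambda^* = -1$, a constraint residual of at most $\epsilon (1 + \sqrt{3})$. Note that the averaged primal point, not the last primal iterate, is the one covered by the bounds.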



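Remark 3.8: (i) When coupling inequalities $Ax + Bz - b \le 0$ are
present, then we choose $Q = \mathbb{R}^n_+$. Using the same reasoning as
before we get that

$$\max_{\lambda \ge 0} \Big\{ -\frac{4 L_c}{(k+1)^2} d_Q(\lambda) + \langle A\hat{x} + B\hat{z} - b, \lambda \rangle \Big\} = \frac{(k+1)^2}{8 L_c} \big\| [A\hat{x} + B\hat{z} - b]^+ \big\|^2,$$

where $[y]^+$ denotes the projection of $y \in \mathbb{R}^n$ onto
$\mathbb{R}^n_+$. Taking for $c$ and $k$ the same values as in Theorem 3.7
we can conclude that after $k$ iterations the duality gap is less than
$\epsilon$ and, using a modified version of Lemma 3.3 (i.e. a generalized
version of the Cauchy-Schwarz inequality:
$-\langle y, \lambda \rangle \ge -\|[y]^+\| \, \|\lambda\|$ for any
$y \in \mathbb{R}^n$, $\lambda \ge 0$), the constraints violation
satisfies
$\|[A\hat{x} + B\hat{z} - b]^+\| \le \epsilon \big( \|\lambda^*\| + \sqrt{\|\lambda^*\|^2 + 2} \big)$.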

(ii) Note that our algorithm can also deal with more general 
inequality constraints, e.g. sum of separable convex functions. 

Finally, let us mention that our decomposition method described in
Algorithm 3.2 bears similarity with proximal type methods [1], [8]-[11],
[13], but is different both in the computational steps and in the choice
of the parameter $c$. More precisely, our method uses a fixed center in
the prox-terms that allows the inner iterations at each $k$ to move
freely, contrary to most proximal type methods that force the next
iterates to be close to the previous ones. The main advantage of our
scheme is that it is fully automatic: the parameter $c$ is chosen
unambiguously, which is crucial for justifying the convergence properties
of Algorithm 3.2.



IV. Applications 
A. Applications with separable structure 

In this section we briefly discuss some of the applications 
to which our method can be applied. 

The first application that we discuss here is the control of large-scale
systems with interacting subsystem dynamics. A distributed model
predictive control (MPC) framework is appealing in this context since this
framework allows us to design local controllers that take care of the
interactions between different subsystems and physical constraints. We
assume that the overall system model can be decomposed into $M$
appropriate subsystem models:

$$x^i(k+1) = \sum_{j \in \mathcal{N}(i)} A_{ij} x^j(k) + B_{ij} u^j(k) \quad \forall i = 1, \dots, M,$$

where $\mathcal{N}(i)$ denotes the set of subsystems that interact with
the $i$th subsystem, including itself. The control and state sequences
must satisfy local constraints: $x^i(k) \in \Omega_i$ and
$u^i(k) \in U_i$ for all $i$ and $k \ge 0$, where the sets $\Omega_i$ and
$U_i$ are usually convex compact sets with the origin in their interior.
In general the control objective is to steer the state of the system to
the origin or any other set point in a "best" way. Performance is
expressed via a stage cost, which in this paper we assume to have the
following form [4]: $\sum_{i=1}^{M} \ell_i(x^i, u^i)$, where usually
$\ell_i$ is a convex quadratic function, not necessarily strictly convex.
Let $N$ denote the prediction horizon. In MPC we must solve at each step
$k$, given $x^i(k) = x^i$, an optimal control problem of the following
form:

$$\min_{x^i_l,\, u^i_l} \Big\{ \sum_{l=0}^{N-1} \sum_{i=1}^{M} \ell_i(x^i_l, u^i_l) : x^i_l \in \Omega_i,\ u^i_l \in U_i \ \forall l, i \Big\}, \qquad (10)$$

where $x^i_0 = x^i$ and
$x^i_{l+1} = \sum_{j \in \mathcal{N}(i)} A_{ij} x^j_l + B_{ij} u^j_l$. A
similar formulation of distributed MPC for coupled linear subsystems with
decoupled costs was given in [4], but without state constraints. Let us
introduce $\mathbf{x}_i = (x^i_0 \cdots x^i_N \; u^i_0 \cdots u^i_{N-1})$,
$X_i = \Omega_i^{N+1} \times U_i^N$ and the non-strictly convex quadratic
functions $\psi_i(\mathbf{x}_i) = \sum_{l=0}^{N-1} \ell_i(x^i_l, u^i_l)$.
Then, the control problem (10) can be recast as a separable convex
program:

$$\min_{\mathbf{x}_i \in X_i} \Big\{ \sum_{i=1}^{M} \psi_i(\mathbf{x}_i) : \sum_{i=1}^{M} C_i \mathbf{x}_i - \gamma = 0 \Big\}, \qquad (11)$$

where the $C_i$'s and $\gamma$ are defined appropriately.

Network optimization furnishes another area in which our Algorithm 3.2
leads to a new method of solution. In this application the convex problem
has the form [2], [3]:

$$\min_{x_i \in X_i} \Big\{ \sum_{i=1}^{M} \psi_i(x_i) : \sum_{i=1}^{M} C_i x_i - \gamma = 0,\ \sum_{i=1}^{M} D_i x_i - \beta \le 0 \Big\}, \qquad (12)$$

where $X_i$ are compact convex sets (in general balls) in the Euclidean
space $\mathbb{R}^m$, the $\psi_i$'s are non-strictly convex functions and
$M$ denotes the number of agents in the network.

In [4] the optimization problem (10) (or equivalently (11)) was solved in
a decentralized fashion, iterating the Jacobi algorithm [1] $p_{\max}$
times. However, the Jacobi algorithm offers no theoretical guarantee on
how good the approximation of the optimum is after $p_{\max}$ iterations,
and moreover strictly convex functions are needed to prove asymptotic
convergence to the optimum. In contrast, if we solve (10) using Algorithm
3.2 (see [5] for more details), we have a guaranteed upper bound (see
Theorem 3.6 or 3.7) on the approximation of the optimum after $p_{\max}$
iterations. In [2], [3] the optimization problem (12) is solved using the
dual subgradient method described in Algorithm 2.2. Some preliminary
simulation tests from Section IV-B show that Algorithm 3.2 is superior to
Algorithm 2.2.

Let us describe briefly the main ingredients of Algorithm 3.2 for problem
(12). Let $d_{X_i}$ be properly chosen prox-functions, according to the
structure of the sets $X_i$ and the norms² used on $\mathbb{R}^m$. Then,
the smooth dual function $f_c$ has the form:

$$f_c(\lambda) = \min_{x_i \in X_i} \sum_{i=1}^{M} \psi_i(x_i) + \Big\langle \lambda_1, \sum_{i=1}^{M} C_i x_i - \gamma \Big\rangle + \Big\langle \lambda_2, \sum_{i=1}^{M} D_i x_i - \beta \Big\rangle + c \sum_{i=1}^{M} d_{X_i}(x_i).$$

Moreover, $Q \subseteq \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}_+$, where
$n_1$ and $n_2$ denote the number of equality and inequality constraints,
respectively. In this case all the minimization problems of Algorithm 3.2
are decomposable in $x_i$ and thus the agents can solve the optimization
problem (12) in a distributed fashion.

B. Computational results 

We conclude this paper with the results of computational experiments on a
random set of problems of the form (12), where the $\psi_i$'s are convex
quadratic functions (but not strictly convex) and the $X_i$'s are balls in
$\mathbb{R}^m$ defined with the Euclidean norm. Similarly, $\mathbb{R}^n$
(corresponding to the Lagrange multipliers) is endowed with the Euclidean
norm (see Section III-D).



                  |         M = 2          |         M = 10
    m       ε     |  PCM     DSM           |  PCM     DSM
   50     0.01    |   202    5000 (0.05)   |   811    5000 (0.29)
  200     0.01    |   625    5000 (0.19)   |  2148    5000 (0.51)
 1000     0.01    |   890   10000 (0.34)   |  3240   10000 (0.62)
   50     0.001   |   688    5000 (0.05)   |  1898    5000 (0.29)
  200     0.001   |  1980    5000 (0.19)   |  6237    5000 (0.51)
 1000     0.001   |  2926   10000 (0.34)   |  7859   10000 (0.62)

In the table we display the number of iterations of Algorithm 2.2 (DSM)
and Algorithm 3.2 (PCM) for different values of $m$, $M$ and of the
accuracy $\epsilon$. For $m = 50$ and $m = 200$ the maximum number of
iterations that we allow is 5000. For $m = 1000$ we iterate at most 10000
times. When the maximum number of iterations is reached we also display in
parentheses the corresponding accuracy. We see that the duality gap is
much better with our Algorithm 3.2 than with Algorithm 2.2.

V. Conclusions 

A new decomposition method in convex programming is developed in this
paper using the framework of dual decomposition. Our method combines the
computationally inexpensive gradient method with the efficiency of
structural optimization for solving separable convex programs. Although
our method resembles proximal-based methods, it differs both in the
computational steps and in the choice of the parameters. Contrary to most
proximal-based methods that force the next iterates to be close to the
previous ones, our method uses fixed centers in the prox-terms, which
leads to more freedom in the next iterates. Another advantage of our
proximal center method is that it is fully automatic, i.e. the parameters
of the scheme are chosen optimally, which is crucial for justifying its
convergence properties. We also presented efficiency estimate results of
the new method for general separable convex problems and proved global
convergence. The computations on some test problems confirm that the
proposed method works well in practice.

² For example, if $X_i = \{x \in \mathbb{R}^m : \|x - x_0\|_2 \le R\}$,
where $\|\cdot\|$ denotes here the Euclidean norm, then it is natural to
take $d_{X_i}(x) = \frac{\|x - x_0\|^2}{2}$. If
$X_i = \{x \in \mathbb{R}^m : x \ge 0,\ \sum_{i=1}^{m} x_{(i)} = 1\}$,
then the norm $\|x\| = \sum_{i=1}^{m} |x_{(i)}|$ and
$d_{X_i}(x) = \ln m + \sum_{i=1}^{m} x_{(i)} \ln x_{(i)}$ are more
suitable (see [15] for more details).